
Add an idempotency key to trace/metrics/logs Export*ServiceRequest #219

Closed
jmacd wants to merge 5 commits into open-telemetry:master from jmacd:jmacd/idempotency

Conversation

@jmacd jmacd (Contributor) commented Sep 3, 2020

Adds an idempotency key as described in #218. This is to support AggregationTemporality=DELTA correctly in systems that ingest and store DELTA data points.

Fixes #218.

@jmacd jmacd requested a review from a team September 3, 2020 00:04
@tigrannajaryan tigrannajaryan (Member) left a comment

I wonder if we want to add this to OTLP in general. Idempotency is a good property for preventing duplicate data, and a good companion for protocols/implementations that err on the side of retrying (which OTLP does, via some nuanced sentences in the spec).

Comment thread on opentelemetry/proto/collector/metrics/v1/metrics_service.proto (outdated)
@jmacd jmacd changed the title Add an idempotency key to ExportMetricsServiceRequest Add an idempotency key to trace/metrics/logs Export*ServiceRequest Sep 3, 2020
```proto
//
// A V4 UUID (RFC 4122, https://tools.ietf.org/html/rfc4122) is the
// recommended idempotency key format.
string idempotency_key = 2;
```
@tigrannajaryan tigrannajaryan (Member) commented Sep 3, 2020

I see you changed it from bytes to string. This is probably OK, but it is bigger on the wire. We could keep the bytes, so that it is typically 16 bytes of UUID on the wire instead of 36 bytes in string representation.

If it is a string, then this comment probably needs to be re-worded, since it refers to bytes and it is not clear what that means:

> Values of size less than 8 bytes are invalid and MUST be ignored.

My slight preference is to keep it as bytes and recommend V4 UUID in the bytes.
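
For concreteness, a minimal Go sketch of the size difference, assuming the google/uuid package (any RFC 4122 implementation would do):

```go
package main

import (
	"fmt"

	"github.com/google/uuid" // assumed stand-in for any RFC 4122 library
)

func main() {
	key := uuid.New() // random (V4) UUID

	raw := key[:]        // 16 bytes, as carried by a proto `bytes` field
	text := key.String() // canonical textual form

	fmt.Println(len(raw), len(text)) // 16 36
}
```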

@jmacd jmacd (Contributor, Author) commented Sep 4, 2020

OK. I have seen 8-byte keys in use, so I was hesitant to require a 16-byte key. I looked at the benchmarks in https://github.com/satori/go.uuid, which suggest a UUIDv4 takes less than 1 microsecond to generate, so that seems like not a big deal. I've reworded this (in the three files) to say:

```proto
  // OTLP clients SHOULD include an idempotency key.  Empty
  // idempotency keys are invalid and MUST be ignored.  A 128-bit V4
  // UUID (https://tools.ietf.org/html/rfc4122) is the
  // recommended idempotency key format.
  bytes idempotency_key = 2;
```
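
For illustration, a sketch of how a client might use the proposed field, assuming Go bindings were generated with an `IdempotencyKey` field (hypothetical: this field was never merged into the OTLP protos). The essential property is that the same key is reused across retries of the same request:

```go
package export

import (
	"context"

	"github.com/google/uuid"

	metricsv1 "go.opentelemetry.io/proto/otlp/collector/metrics/v1"
)

// exportWithRetry sends one request, reusing the same idempotency key on
// every retry so the server can recognize redelivery of the same data.
// req.IdempotencyKey is the field proposed in this PR, not a released
// OTLP field.
func exportWithRetry(ctx context.Context, client metricsv1.MetricsServiceClient,
	req *metricsv1.ExportMetricsServiceRequest) error {
	key := uuid.New()
	req.IdempotencyKey = key[:] // 16 raw bytes, generated once per request

	var err error
	for attempt := 0; attempt < 3; attempt++ {
		if _, err = client.Export(ctx, req); err == nil {
			return nil
		}
		// Retrying with the SAME key makes a duplicate delivery detectable.
	}
	return err
}
```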

@tigrannajaryan tigrannajaryan (Member) commented

I am fine with only recommending 16-byte UUIDs rather than requiring them. My concern was primarily around using bytes vs string as the data type.

@tigrannajaryan tigrannajaryan (Member) commented

@jmacd I know I already gave my approval but I have some concerns. I have been thinking about this since yesterday and haven't managed to convince myself. Can you please help me understand?

If I understand correctly, the problem this solves is duplicate data on the server side. What I do not quite understand is how important it is to avoid the duplicates. Are we looking for a guaranteed lack of duplicates or only for a reduction in the number of duplicates?

The solution as proposed works at the request level. For intermediary Collectors this has limited use. Collectors can be (and in production often are) behind a load balancer, with no guarantee that retried requests go to the same Collector instance. This makes detection either impossible (if we store the idempotency keys locally per instance) or requires Collector instances to coordinate and exchange idempotency keys (which is complicated, and I am not sure we will actually implement it in the Collector). If we don't implement idempotency on the receiving end of the Collector, then I am not sure of the overall usefulness of this concept in a real production telemetry collection pipeline.

I am not sure if the solution in this PR seeks to prevent duplicates or to guarantee exactly-once delivery. If it is the former, then we can possibly achieve it in a different way: by changing the retry logic in OTLP to never retry in situations which may result in duplicates (particularly in all situations where the server does not explicitly confirm that it did not receive the data). This could be an operation mode for OTLP (as opposed to the currently specified operation, which aims for at-least-once). Obviously this increases the risk of data loss, so it is a tradeoff.

If we are looking for exactly-once guarantees, that's a different story, and I don't think this proposal solves it.

In the presence of intermediary nodes and multiple delivery routes, request-level measures that operate on a single hop of communication are not sufficient to implement delivery guarantees. Such guarantees likely require ids/keys that travel together with the telemetry data end-to-end, intact, through multiple hops of the collection pipeline. This proposal does not provide the means for that.
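
To make the per-instance limitation concrete, here is a minimal Go sketch of request-level deduplication state (an illustration, not Collector code); every instance behind the load balancer would hold its own copy, so a retry routed to a different instance goes undetected:

```go
package dedupe

import (
	"sync"
	"time"
)

// seenKeys is per-process state: a second Collector instance behind the
// same load balancer holds its own copy and cannot see keys recorded here.
type seenKeys struct {
	mu   sync.Mutex
	keys map[string]time.Time // idempotency key -> first-seen time
	ttl  time.Duration
}

func newSeenKeys(ttl time.Duration) *seenKeys {
	return &seenKeys{keys: make(map[string]time.Time), ttl: ttl}
}

// observe reports whether key was already seen within the TTL window,
// recording it as seen otherwise.
func (s *seenKeys) observe(key string) (duplicate bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if t, ok := s.keys[key]; ok && time.Since(t) < s.ttl {
		return true
	}
	s.keys[key] = time.Now()
	return false
}
```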

@jmacd jmacd (Contributor, Author) commented Sep 8, 2020

@tigrannajaryan this proposal is very limited, as you pointed out. Let me illustrate, for the moment, how I was imagining this can be used. Suppose the collector simply passes through each OTLP request, so that one request in equals one request out. The collector can modify the data (filter, rename, etc.), but the output corresponds to 100% of the input and uses the same idempotency key.

The OTLP request will eventually be received by an endpoint that intends to store the data, at which point I presume the data from the original request will be split into individual units (e.g., metric data points, log events, spans) before writing. Each individual unit of data keeps the same idempotency key, and the final destination of this data pipeline stores each unit with its associated idempotency key. When another individual unit of data arrives, it goes to the same storage location, at which point the storage layer can recognize a duplicate point by its idempotency key.

Generally this results in inflated storage costs for recent data; commonly this is addressed by rewriting data after the point at which no new data will be accepted, dropping idempotency keys at that time.
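
A minimal Go sketch of the storage-side deduplication step described above, with illustrative names (not from any OTLP schema):

```go
package store

import "bytes"

// pointID identifies where a data point lands in storage; the fields are
// illustrative stand-ins for a real series/timestamp identity.
type pointID struct {
	series       string // e.g. metric name plus an attribute-set hash
	timeUnixNano uint64
}

// recentKeys holds the idempotency key written with each recent point.
// After the ingestion window closes, data is rewritten and keys dropped,
// reclaiming the extra storage.
type recentKeys map[pointID][]byte

// isDuplicate reports whether a point with this identity already arrived
// carrying the same idempotency key, i.e. a redelivery of the same request.
func (r recentKeys) isDuplicate(id pointID, key []byte) bool {
	prev, ok := r[id]
	return ok && bytes.Equal(prev, key)
}

// record remembers the key for a newly written point.
func (r recentKeys) record(id pointID, key []byte) {
	r[id] = key
}
```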

The approach above works in the presence of multiple routes and a horizontally-scaled collector pool, but the one-request-to-one-request property limits what the collector can do, particularly its ability to correctly batch data.

Correcting this problem in general could take any number of forms. This is where we could begin to coordinate an exchange of idempotency keys between collectors. A more straightforward system architecture would place a Kafka queue between the producer of individual requests and the consumer that batches them. The consumer then has the proper state to deduplicate requests by their idempotency key. Correctly using a collector pipeline that batches or joins requests in some way would require this sort of solution.

I am interested in this simple form of idempotency key because it makes simple things possible, while we could continue to study the ways that more general forms of idempotency can be arranged.

@tigrannajaryan tigrannajaryan (Member) commented

> Suppose the collector simply passes through each OTLP request, so that one request in equals one request out.

This is not how the Collector works today. The Collector is not designed to pass requests through; it unpacks each request at the receiving end and drops any portion of the request Protobuf message that is not the telemetry data itself. We don't have a good way of preserving the request's idempotency key in the Collector pipeline.

> The approach above works in the presence of multiple routes and a horizontally-scaled collector pool, but the one-request-to-one-request property limits what the collector can do, particularly its ability to correctly batch data.

Yes, it would work if the Collector worked the way you described, but it doesn't at the moment and I don't think we can make it work that way.

I understand that in scenarios where the Collector is not present and you send directly to the backend, your proposal will work and is valuable, but I wonder if that is good enough for your needs (or for the needs of others who may also be interested in preventing duplicates). Do you envision that it will be common for you/your customers to not have a Collector?

@bogdandrutu bogdandrutu (Member) left a comment

Blocking this PR because I feel we don't need it right now (maybe I am wrong), and, as @tigrannajaryan mentioned, things like batching in the collector will not work.

Can we work on this after GA? If it is extremely critical we can discuss it more, but I would like to set the mindset that, unless something is extremely critical, let's delay it until after GA.

@jmacd jmacd (Contributor, Author) commented Dec 23, 2020

Further investigation into the Metrics part of OTLP leads me to think this is not needed after we specify a mandatory service.instance.id resource combined with timestamps in each data point.
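
For illustration, the identity that would make the key unnecessary under that assumption (field names are illustrative, not generated OTLP types):

```go
// With a mandatory service.instance.id resource attribute, a DELTA metric
// point can be identified for deduplication without a request-level key.
type deltaPointIdentity struct {
	serviceInstanceID string // resource attribute service.instance.id
	metricName        string
	startTimeUnixNano uint64 // data point start timestamp
	timeUnixNano      uint64 // data point end timestamp
}
```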
